Low-complexity Optimal Scheduling over 
Correlated Fading Channels with ARQ Feedback 

Wenzhuo Ouyang, Atilla Eryilmaz, and Ness B. Shroff 

Department of Electrical and Computer Engineering 
The Ohio State University 
Columbus, OH, 43210 
{ouyangw, eryilmaz, shroff}@ece.osu.edu

Abstract

We investigate the downlink scheduling problem under Markovian ON/OFF fading channels, where the instantaneous
channel state information is not directly accessible, but is revealed via ARQ-type feedback. The scheduler can exploit the
temporal correlation/channel memory inherent in the Markovian channels to improve network performance. However, designing
low-complexity and throughput-optimal algorithms under temporal correlation is a challenging problem. In this paper, we find
that under an average number of transmissions constraint, a low-complexity index policy is throughput-optimal. The policy uses
Whittle's index value, which was previously used to capture opportunistic scheduling under temporally correlated channels. Our
results build on the interesting finding that, under the intricate queue length and channel memory evolutions, the importance
of scheduling a user is captured by a simple multiplication of its queue length and Whittle's index value. The proposed
queue-weighted index policy has provably low complexity which is significantly lower than existing optimal solutions.



I. Introduction 

In wireless networks with randomly fluctuating channels, intelligently scheduling users is critical for achieving high 
network efficiency. Under the assumption that the scheduler possesses accurate instantaneous Channel State Information 
(CSI), maximum-weight scheduling algorithms (e.g., [1]-[3]) are known to be throughput-optimal, i.e., no scheduling policy
can ensure system stability for arrival rates that are not supportable by a max-weight scheduler.

In practice, accurate instantaneous CSI is difficult to obtain at the scheduler. Hence, in this work we consider the important
scenario where the instantaneous CSI is not directly accessible to the scheduler, but is instead revealed through ARQ-type
feedback only after each scheduled data transmission.

The time-correlation or channel memory inherent in the fading channels can be exploited by the scheduler for more
informed decisions, and hence to obtain large throughput gains (e.g., HfS)). In this paper, we incorporate the temporal
correlation by modeling the fading channels as Markov-modulated ON/OFF processes.

Under imperfect CSI, channel memory, and limited network resources, designing throughput-optimal scheduling schemes is
highly challenging. This is because the scheduler needs to optimally balance the intricate 'exploitation-exploration tradeoff',
i.e., to decide whether to exploit the channels with more up-to-date CSI, or to explore the channels with outdated CSI. The
packets destined to each user are stored in a corresponding data queue before transmission. Due to this temporal correlation
and imperfect ARQ-based CSI, developing a throughput-optimal scheduler requires a complex characterization of the interplay
between user scheduling, channel memory evolution and queue evolution. Therefore, traditional Lyapunov drift minimization
techniques do not apply in this context.

Under the aforementioned complications, traditional Dynamic Programming based approaches can be used, but are 
intractable due to the well-known 'curse of dimensionality'. In related works EH, a simple round-robin based scheduling 
policy is shown to possess the throughput-optimality property. However, such a scheme is only optimal in the regime of 
a large number of users with identical Markovian channel statistics. In QH), a throughput-optimal frame-based policy is 
proposed. This policy relies on solving a linear program in each frame, which is hindered by the curse of dimensionality:
the computational complexity grows exponentially with the network size.

In this work, we study throughput-optimal downlink scheduling under imperfect CSI over heterogeneous Markovian fading
channels. We assume that each user occupies a dedicated channel, i.e., all users can transmit simultaneously, but the long-term
average number of transmissions is limited. Such a constraint can be used to limit the energy consumption or interference
effect depending on the context. One example where energy consumption must be limited is green cellular networks (e.g., [9]-[11]).
It is estimated that the cellular base stations consume 4.5 GW of power globally, which corresponds to more than 40 million
metric tons of CO2 emission and over $10 billion electricity bill annually [9], [10]. With energy expenditure rising by 15-20%
each year, an important objective in green cellular networks design is to reduce the long-run average number of data
transmissions to decrease energy consumption [10]. Therefore, it is of great interest to understand the relationship between
the achievable throughput region and the constraint on the long-term average number of transmissions. Meanwhile,
restricting the average number of transmissions also helps to reduce interference between concurrent transmissions in the
network.

Fig. 1. Two state Markov Chain model.

Specifically, our contributions are as follows:

• Under the constraint on the average number of transmissions, we propose a low-complexity throughput-optimal policy.
The policy operates over separate time frames, where the per-frame computational complexity is at most $O(N \log N)$ in
the number of users $N$. Therefore, the policy does not suffer from the curse of dimensionality.

• The proposed policy builds on Whittle's index analysis of the Restless Multi-armed Bandit Problem [12], where Whittle's
index value is used to measure the importance of scheduling a user under the time-correlated channel [13]. We find that,
interestingly, under the coupled queue length and channel memory evolution, the importance of scheduling a user is measured
by a simple multiplication of the queue length and Whittle's index value.



II. System Model 

A. Downlink Scheduling Problem 

We consider a time-slotted wireless downlink network with one base station and $N$ users, where each user $i$ occupies a
dedicated wireless channel. The channel state of user $i$, denoted by $C_i[t]$ at slot $t$, evolves according to an ON/OFF Markov
chain across time slots within the state space $S = \{0, 1\}$, independently across channels. When the channel is in state "1",
one packet can be successfully transmitted; otherwise no packet can be delivered.¹ As shown in Fig. 1, the channel state
evolution is represented by the transition probabilities

\[ p^i_{11} := \Pr\big(C_i[t]=1 \mid C_i[t-1]=1\big), \qquad p^i_{01} := \Pr\big(C_i[t]=1 \mid C_i[t-1]=0\big). \]

We assume that the Markovian channels are positively correlated, i.e., $p^i_{11} > p^i_{01}$ for $i = 1, 2, \ldots, N$, which has been
commonly used to model the wireless channels in slow fading environments (e.g., 151 fPH ).
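As a minimal illustrative sketch (not part of the original analysis), the two-state channel above can be simulated as follows. The function names, the seed handling, and the choice to draw the initial state from the stationary distribution $p_{01}/(1+p_{01}-p_{11})$ are assumptions made for this example only.

```python
import random

def stationary_on_probability(p11, p01):
    """Stationary probability of the ON state for the two-state chain."""
    return p01 / (1.0 + p01 - p11)

def simulate_channel(p11, p01, steps, seed=0):
    """Simulate an ON/OFF Markov channel and return a list of 0/1 states.

    p11 = Pr(ON now | ON before), p01 = Pr(ON now | OFF before).
    The initial state is drawn from the stationary distribution
    (an assumption of this sketch).
    """
    rng = random.Random(seed)
    state = 1 if rng.random() < stationary_on_probability(p11, p01) else 0
    states = []
    for _ in range(steps):
        states.append(state)
        p_on = p11 if state == 1 else p01
        state = 1 if rng.random() < p_on else 0
    return states
```

With positively correlated parameters such as `p11 = 0.8` and `p01 = 0.2`, consecutive states tend to repeat, which is the channel memory the scheduler exploits.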

At the beginning of each time slot, the scheduler chooses users for data transmission. The scheduling decisions are 
made without the exact knowledge of the channel state in the current slot. Instead, the accurate ON/OFF channel state 
of a scheduled user is revealed via ACK/NACK feedback from the receiver, only at the end of each slot following data
transmission. 

We consider the class $\Phi$ of (possibly non-stationary) scheduling policies that make scheduling decisions based on the
history of observed channel states, arrival processes, and scheduling decisions. Under the aforementioned restrictions on
average energy consumption, the scheduling schemes are subject to the constraint that the long-term average number of
scheduled transmissions is at most $M$,

\[ \limsup_{T\to\infty} \frac{1}{T}\, E\Big[ \sum_{t=0}^{T-1} \sum_{i=1}^{N} a_i^{\phi}[t] \Big] \le M, \tag{1} \]

where $a_i^{\phi}[t]$ indicates whether user $i$ is scheduled at slot $t$ under policy $\phi \in \Phi$.

Data packets destined for different users are stored in separate queues before transmission. The queue length for user $i$ is
denoted by $q_i[t]$ at slot $t$. We assume that the packet arrivals for the $i$-th user form an i.i.d. process $A_i[t]$ with mean $\lambda_i$ and
a bounded second moment. Hence, the $i$-th data queue evolves as $q_i[t+1] = \max\{0,\, q_i[t] - a_i[t] \cdot C_i[t]\} + A_i[t]$.
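The queue recursion can be written as a one-line update. The sketch below is illustrative only; the function name and argument names are hypothetical, and at most one packet is served per slot as in the model above.

```python
def queue_step(q, scheduled, channel_on, arrivals):
    """One slot of q[t+1] = max(0, q[t] - a[t]*C[t]) + A[t].

    scheduled  -- a[t], True if the user is scheduled in this slot
    channel_on -- C[t], True if the channel is in state 1
    arrivals   -- A[t], number of packets arriving in this slot
    """
    served = 1 if (scheduled and channel_on) else 0
    return max(0, q - served) + arrivals
```

Note that a packet leaves the queue only when the user is scheduled and the channel happens to be ON; a scheduled transmission over an OFF channel consumes a transmission opportunity without reducing the backlog.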



B. Belief Value Evolution 

The scheduler maintains a belief value $\pi_i[t]$ for each channel $i$, defined as the probability of channel $i$ being in state 1 at
the beginning of the $t$-th slot conditioned on the past channel state observations. The belief values are hence updated according
to the scheduling decisions and accurate channel state feedback as follows,

\[ \pi_i[t+1] = \begin{cases} p^i_{11} & \text{if } a_i[t] = 1 \text{ and } C_i[t] = 1, \\ p^i_{01} & \text{if } a_i[t] = 1 \text{ and } C_i[t] = 0, \\ Q_i(\pi_i[t]) & \text{if } a_i[t] = 0, \end{cases} \tag{2} \]

where $Q_i(x) = x\,p^i_{11} + (1-x)\,p^i_{01}$ is the belief evolution operator when user $i$ is not scheduled in the current slot. In our setup,
the belief values are known to be sufficient statistics to represent the past scheduling decisions and feedback [14].
Meanwhile, the belief value $\pi_i[t]$ is the expected throughput for user $i$ if it is scheduled in slot $t$.

¹ Our results easily generalize to the scenario where multiple packets, different across channels, can be transmitted in state "1".

Fig. 2. Belief value evolution.

For the $i$-th user, we use $b^i_{c,l}$ to denote the state of its belief value when the most recent channel state was observed $l$
time slots ago to be $c \in \{0, 1\}$. The closed form expression of $b^i_{c,l}$ can be calculated from (2) and is given as follows,

\[ b^i_{0,l} = \frac{p^i_{01} - (p^i_{11} - p^i_{01})^{l}\, p^i_{01}}{1 + p^i_{01} - p^i_{11}}, \qquad b^i_{1,l} = \frac{p^i_{01} + (1 - p^i_{11})(p^i_{11} - p^i_{01})^{l}}{1 + p^i_{01} - p^i_{11}}. \]

As depicted in Fig. 2, if the scheduler is never informed of the $i$-th user's channel state, the belief value monotonically
converges to the stationary probability $b^i_s := p^i_{01} / (1 + p^i_{01} - p^i_{11})$ of the channel being in state 1. We assume that the belief
values of all channels are initially set to their stationary values. It is then clear that, based on (2), each belief value $\pi_i[t]$
evolves over a countable state space, denoted by $\mathcal{B}_i = \{b^i_s,\, b^i_{c,l} : c \in \{0, 1\},\ l \in \mathbb{Z}^+\}$.
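The belief update and its closed form can be cross-checked numerically. The following is a minimal sketch under the assumption that the closed forms read $b_{0,l} = (p_{01} - (p_{11}-p_{01})^l p_{01})/(1+p_{01}-p_{11})$ and $b_{1,l} = (p_{01} + (1-p_{11})(p_{11}-p_{01})^l)/(1+p_{01}-p_{11})$; the function names are hypothetical.

```python
def evolve(belief, p11, p01):
    """Belief operator Q(x) = x*p11 + (1-x)*p01 for an unscheduled user."""
    return belief * p11 + (1.0 - belief) * p01

def update_belief(belief, scheduled, channel_on, p11, p01):
    """One-slot belief update: feedback resets the belief, idling evolves it."""
    if scheduled:
        return p11 if channel_on else p01
    return evolve(belief, p11, p01)

def belief_closed_form(c, l, p11, p01):
    """Belief l slots after the channel was last observed in state c."""
    d = p11 - p01
    if c == 0:
        return (p01 - d ** l * p01) / (1.0 + p01 - p11)
    return (p01 + (1.0 - p11) * d ** l) / (1.0 + p01 - p11)
```

Starting from `p01` (or `p11`) and applying `evolve` repeatedly reproduces `belief_closed_form(0, l, ...)` (or `belief_closed_form(1, l, ...)`), and both sequences approach the stationary value from below and above, respectively, when `p11 > p01`.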

C. Network Stability Region and Achievable Rate Region 

We adopt the following definition of queue stability [16]: queue $i$ is stable if there exists a limiting stationary distribution
$F_i$ such that $\lim_{t\to\infty} P(q_i[t] \le q) = F_i(q)$. The network stability region $\Lambda$ is defined as the closure of the set of arrival rate
vectors supported by all policies in class $\Phi$ that do not lead to system instability while abiding by the constraint (1).

Meanwhile, we define the achievable rate region $\Gamma$ as the closure of the set of service rate vectors $\boldsymbol{\gamma}$ that can be
achieved by all policies, i.e.,

\[ \Gamma = \mathrm{Cl}\Big\{ \boldsymbol{\gamma} : \exists\, \phi \in \Phi \text{ with } \gamma_i = \liminf_{T\to\infty} \frac{1}{T}\, E\Big[ \sum_{t=0}^{T-1} \pi_i[t] \cdot a_i^{\phi}[t] \Big],\ i = 1, \ldots, N, \text{ subject to constraint (1)} \Big\}, \tag{3} \]

where $\mathrm{Cl}\{\cdot\}$ denotes the closure of the set. The rate region is hence a convex set, since, by appropriately randomizing
between any two policies, all the rate vectors between the corresponding two rate points can be achieved.

The rate region $\Gamma$ corresponds to the expected throughput that can be achieved in the system with infinitely backlogged
queues. Therefore it provides an upper bound on the stability region $\Lambda$. As we shall see in the following sections, the two
regions $\Gamma$ and $\Lambda$ turn out to share the same interior and are, therefore, "equal".

III. Optimal Policy for Weighted Sum-throughput Maximization 

In this section, we consider a weighted sum-throughput maximization problem. The policy introduced here, which is based
on scaling the Whittle's index values, not only achieves the transmission rate at the boundary of the achievable rate region
$\Gamma$, but also plays an important part in the throughput-optimal policy of the next section, which stabilizes all arrival rates within
the system stability region $\Lambda$ (the main result of the paper).

A. Weighted Sum-throughput Maximization Problem

Consider the following weighted sum-throughput maximization problem $\Psi(\mathbf{r}, M)$ for a given $\mathbf{r} = (r_i)_{i=1}^{N}$, where the
expected service rate for each user $i$ is scaled by a positive factor $r_i$,

\[ V(\mathbf{r}, M) = \max_{\phi \in \Phi}\ \liminf_{T\to\infty} \frac{1}{T}\, E\Big[ \sum_{t=0}^{T-1} \sum_{i=1}^{N} r_i \cdot \pi_i[t] \cdot a_i^{\phi}[t] \Big] \tag{4} \]

\[ \text{s.t.}\quad \limsup_{T\to\infty} \frac{1}{T}\, E\Big[ \sum_{t=0}^{T-1} \sum_{i=1}^{N} a_i^{\phi}[t] \Big] \le M. \tag{5} \]

The above problem $\Psi(\mathbf{r}, M)$ is a constrained Partially Observable Markov Decision Process (CPOMDP) (T31l . Consider the
optimal policy $\phi^*(\mathbf{r}, M)$ (if it exists) for the problem $\Psi(\mathbf{r}, M)$ and let $\boldsymbol{\gamma}^*(\mathbf{r})$ be the rate vector it achieves, with components
$\gamma_i^*(\mathbf{r}) = \liminf_{T\to\infty} \frac{1}{T} E\big[ \sum_{t=0}^{T-1} \pi_i[t] \cdot a_i^{\phi^*(\mathbf{r}, M)}[t] \big]$.
Then, as illustrated in Fig. 3, the achieved rate vector $\boldsymbol{\gamma}^*(\mathbf{r})$ is at the intersection of the achievable rate region $\Gamma$ and the
supporting hyper-plane $\mathcal{H}$ with normal vector $\mathbf{r}$. We proceed to characterize the optimal policy $\phi^*(\mathbf{r}, M)$.

Fig. 3. Illustration of the achieved rate vector $\boldsymbol{\gamma}^*(\mathbf{r})$ under policy $\phi^*(\mathbf{r}, M)$ with weight vector $\mathbf{r}$.

Under uniform weights $\mathbf{r} = \mathbf{1}$, an optimal policy for problem $\Psi(\mathbf{1}, M)$ is proposed in [13] based on Whittle's indexability
analysis of the Restless Multi-armed Bandit Problem [12]. Specifically, for channel $i$, a closed form Whittle's index value $W_i(\pi)$
is assigned to each belief state $\pi \in \mathcal{B}_i$. The index value intelligently captures the exploitation-exploration value to be gained
from scheduling the user at the corresponding belief state [13]. Details of Whittle's indexability analysis can be found in
[12], [13], [17]. The closed form expression of the Whittle's index value $W_i(\pi)$, $\pi \in \mathcal{B}_i$, is given as follows [13], [17],

\[ W_i(\pi) = \begin{cases} \dfrac{(b^i_{0,l} - b^i_{0,l+1})(l+1) + b^i_{0,l+1}}{1 - p^i_{11} + (b^i_{0,l} - b^i_{0,l+1})\, l + b^i_{0,l+1}} & \text{if } p^i_{01} \le \pi = b^i_{0,l} < b^i_s, \\[2ex] \dfrac{\pi}{1 - p^i_{11} + \pi} & \text{if } b^i_s < \pi \le p^i_{11}, \\[2ex] \dfrac{p^i_{01}}{(1 - p^i_{11})(1 + p^i_{01} - p^i_{11}) + p^i_{01}} & \text{if } \pi = b^i_s. \end{cases} \tag{6} \]

It can be observed from (6) that $W_i(\pi)$ monotonically increases with $\pi$ and satisfies $W_i(\pi) \in [0, 1]$. In the following
key lemma, we relate the optimal algorithm developed for problem $\Psi(\mathbf{1}, M)$ to the problem $\Psi(\mathbf{r}, M)$ with arbitrary weight
vector $\mathbf{r}$. The proof of the lemma can be found in Appendix A.
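The index values are cheap to evaluate once the belief states are known. The sketch below is an illustration only: it assumes the closed form of the index is $W(b_{0,l}) = \frac{(b_{0,l}-b_{0,l+1})(l+1)+b_{0,l+1}}{1-p_{11}+(b_{0,l}-b_{0,l+1})\,l+b_{0,l+1}}$ for beliefs below the stationary value and $W(\pi) = \pi/(1-p_{11}+\pi)$ for beliefs at or above it, and all function names are hypothetical.

```python
def b0(l, p11, p01):
    """Belief l slots after an OFF observation: p01 * (1 - d^l) / (1 - d)."""
    d = p11 - p01
    return p01 * (1.0 - d ** l) / (1.0 - d)

def whittle_index_below(l, p11, p01):
    """Assumed index at belief b_{0,l}, which lies below the stationary value."""
    cur, nxt = b0(l, p11, p01), b0(l + 1, p11, p01)
    numerator = (cur - nxt) * (l + 1) + nxt
    denominator = 1.0 - p11 + (cur - nxt) * l + nxt
    return numerator / denominator

def whittle_index_above(belief, p11):
    """Assumed index for beliefs between the stationary value and p11."""
    return belief / (1.0 - p11 + belief)
```

Under these assumed expressions the index equals `p01` at the belief `p01`, equals `p11` at the belief `p11`, and increases with the belief in between, which agrees with the monotonicity and range properties stated in the text.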

Lemma 1. There exists an optimal stationary policy $\phi^*(\mathbf{r}, M)$ for problem $\Psi(\mathbf{r}, M)$ (cf. (4)-(5)), parameterized by a
threshold $\omega^*$ and a randomization factor $\rho^* \in (0, 1]$, such that

(i) The scheduler maintains an r-weighted index value $W_i^{r}(\pi_i[t]) = r_i \cdot W_i(\pi_i[t])$ for user $i$.

(ii) User $i$ is scheduled if $W_i^{r}(\pi_i[t]) > \omega^*$, and stays idle if $W_i^{r}(\pi_i[t]) < \omega^*$. If $W_i^{r}(\pi_i[t]) = \omega^*$, it is scheduled with probability
$\rho^*$.

(iii) The parameters $\omega^*$ and $\rho^*$ are such that the long-term average number of transmissions equals $M$.

Remark: Interestingly, by multiplying the Whittle's index values $W_i(\pi_i[t])$ with $r_i$, the optimal policy $\phi^*(\mathbf{1}, M)$ proposed in
[13] for problem $\Psi(\mathbf{1}, M)$ is extended to solve the more general problem $\Psi(\mathbf{r}, M)$. This property is important for designing
the low-complexity and throughput-optimal policy in Section IV.

B. State Space Truncation 

Recall that the belief value evolves over a countable state space $\mathcal{B}_i$ for user $i$ and approaches the stationary value if the
channel is not active for a long time. This motivates us to consider a truncated version of the belief value evolution whereby
the belief value of a user is set to its steady state (i.e., its channel state history is entirely forgotten) if the corresponding
channel has not been scheduled for a long time, say $\tau$ slots. The finite space truncation not only facilitates more tractable
analysis, it also provides a close approximation to the countable state space. We let $\mathcal{B}_i^{\tau}$ denote the truncated state space for
the $i$-th user, i.e., $\mathcal{B}_i^{\tau} = \{b^i_s,\, b^i_{c,l} : c \in \{0, 1\},\ l = 1, 2, \ldots, \tau\}$, and let $\mathcal{B}^{\tau} = [\mathcal{B}_1^{\tau}, \ldots, \mathcal{B}_N^{\tau}]$.

In what follows, we introduce the r-weighted index policy $\phi^{\tau}(\mathbf{r}, M)$ that operates over the truncated state space. We shall
prove (in Lemma 2) that, under sufficiently large truncation size, the throughput performance of $\phi^{\tau}(\mathbf{r}, M)$ is very close to
that of $\phi^*(\mathbf{r}, M)$.



r-weighted Index Policy $\phi^{\tau}(\mathbf{r}, M)$

(1) At time slot $t$, user $i$ is scheduled if the r-weighted index value $W_i^{r}(\pi_i[t]) > \omega^*$, and stays passive if
$W_i^{r}(\pi_i[t]) < \omega^*$. If $W_i^{r}(\pi_i[t]) = \omega^*$, user $i$ is scheduled with probability $\rho^*$.

(2) The parameters $\omega^*$ and $\rho^*$, calculated in the initialization phase, are such that the long-term average number
of transmissions equals $M$.
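The per-slot decision rule of the policy can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function name is hypothetical, and the threshold `omega` and randomization factor `rho` are assumed to have been computed beforehand by the initialization phase.

```python
import random

def schedule_user(weight, whittle_index, omega, rho, rng):
    """Threshold rule on the r-weighted index value weight * whittle_index.

    Returns True if the user is scheduled in this slot.
    Ties at the threshold are broken by scheduling with probability rho.
    """
    value = weight * whittle_index
    if value > omega:
        return True
    if value < omega:
        return False
    return rng.random() < rho
```

Each user's decision depends only on its own index value and the two shared parameters, so one slot costs a constant amount of work per user.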



Note that, to implement the policy $\phi^{\tau}(\mathbf{r}, M)$, the parameters $\omega^*$ and $\rho^*$ need to be calculated at the initialization phase.
We next design the initialization phase based on the observation that the average number of transmissions decreases when
either the threshold $\omega$ increases or the randomization factor $\rho$ decreases [13]. Hence, during initialization, we first identify
the parameter $\omega^*$ by increasing the threshold $\omega$ until the constraint (1) is satisfied. Then we select the randomization factor
$\rho^*$ so that the constraint (1) is satisfied with equality. We let $\alpha_i(\omega, \rho)$ denote the expected transmission
time of user $i$ under a policy with threshold $\omega$ and randomization factor $\rho$. The closed form expression of $\alpha_i(\omega, \rho)$ is derived
as follows,

P b o,h+( 1 -P) b o,h+i+( 1 ^>n)( h + 1 ~P) 1 °' h 

Pib'o.h-bD + i-Pli+K ■.f.,_ W r (h i V 

pbS, T + (l-p)bj + (l-PU)(^+l-p) Wl { °°> r> ' (7) 

if u =Wf (bl); 



i(u,p) = < 



p(l-p*n+6*) 



(l+r)(l-pi 1 )+pb| 

if uj>W[(bi) 



We formally introduce the initialization phase next. 



Initialization phase: calculation of $\omega^*$ and $\rho^*$

1. Calculate the r-weighted index value $W_i^{r}(\pi_i) = r_i \cdot W_i(\pi_i)$ for all $\pi_i \in \mathcal{B}_i^{\tau}$, $i = 1, \ldots, N$;

2. Sort the r-weighted index values of all belief states of all users into a $(2\tau+1)N$-dimensional vector $\mathbf{w}$ in increasing
order. Let $\sigma(k)$ be the user index corresponding to the $k$-th element $w_k$ of vector $\mathbf{w}$.

3. Let $k = 1$ and $\alpha_i = 1$, $i = 1, \ldots, N$.

4. Calculate the activation time $\alpha_{\sigma(k)}(w_k, 1)$ of user $\sigma(k)$ from (7), and update $\alpha_{\sigma(k)} = \alpha_{\sigma(k)}(w_k, 1)$.

5. If $\sum_{i=1}^{N} \alpha_i < M$, then $\omega^* = w_{k-1}$; calculate the randomization factor $\rho^*$ from (7) such that $\sum_{i \ne \sigma(k)} \alpha_i +
\alpha_{\sigma(k)}(\omega^*, \rho^*) = M$; output $\omega^*$ and $\rho^*$. Otherwise, let $k = k + 1$, and go to Step 4.



Remark: The computational complexity of the initialization phase is dominated by sorting the index values in the second
step, which has complexity $O\big((2\tau + 1)N \cdot \log((2\tau + 1)N)\big)$. After initialization, the r-weighted Index Policy $\phi^{\tau}(\mathbf{r}, M)$
takes a very simple form: in each slot, schedule a user (possibly with randomization) if its r-weighted index value is above
a threshold. Therefore, the per-slot computational complexity is $O(N)$.



We let the value $\tau_0$ be



T =4max{ —. — — ; —,i=l,---.N\. (8) 



Let $V^{\tau}(\mathbf{r}, M)$ be the weighted sum-throughput under policy $\phi^{\tau}(\mathbf{r}, M)$, i.e.,

\[ V^{\tau}(\mathbf{r}, M) = \liminf_{T\to\infty} \frac{1}{T}\, E\Big[ \sum_{t=0}^{T-1} \sum_{i=1}^{N} r_i \cdot \pi_i[t] \cdot a_i^{\phi^{\tau}(\mathbf{r}, M)}[t] \Big]. \tag{9} \]

The next lemma bounds the throughput performance difference between policies $\phi^*(\mathbf{r}, M)$ and $\phi^{\tau}(\mathbf{r}, M)$.

Lemma 2. For $\tau > \tau_0$, the throughput performance difference between the policy $\phi^*(\mathbf{r}, M)$ and $\phi^{\tau}(\mathbf{r}, M)$ is upper bounded
as follows,

\[ \big| V(\mathbf{r}, M) - V^{\tau}(\mathbf{r}, M) \big| \le f(\tau) \sum_{i=1}^{N} r_i, \tag{10} \]

where $f(\tau) = \sum_{i=1}^{N} \alpha_i\big(W_i(b^i_{0,\tau}), 1\big)$, which satisfies $f(\tau) \to 0$ as $\tau \to \infty$.

Proof: We prove this lemma by carefully studying the relationship between the truncation size $\tau$ and the achieved transmission
rate. For details, please refer to Appendix B. ■



IV. Frame-based queue-weighted index policy 

In this section, we propose a throughput-optimal scheduling policy that operates over the truncated state space. The policy
is based on the r-weighted index policy proposed in the last section and has low complexity.

A. Throughput Optimal Algorithm

We divide the time slots into separate time frames of length $T$, where the $k$-th frame includes time slots $kT, \ldots, (k+1)T-1$.
The scheduling decisions in the $k$-th frame are made based on the queue length information $\mathbf{q}[kT]$ at the beginning of that
frame. During the $k$-th frame, the policy $\phi^{\tau}(\mathbf{q}[kT], M)$, developed in the last section, is implemented with the queue-weighted
index values. Formally, the T-frame queue-weighted index policy $QWI^{\tau}(T, M)$ is introduced next.

T-Frame Queue-Weighted Index Policy $QWI^{\tau}(T, M)$

The time slots are divided into frames of length $T$. Within the $k$-th frame, the $\mathbf{q}[kT]$-weighted index policy
$\phi^{\tau}(\mathbf{q}[kT], M)$ is implemented for $T$ consecutive slots, over the truncated state space $\mathcal{B}^{\tau}$.
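The frame structure can be illustrated with a deliberately simplified loop. Everything in this sketch is an assumption made for illustration: `calibrate` is a hypothetical stand-in for the initialization phase that returns a threshold and randomization factor, the Whittle index of each user is held fixed instead of evolving with the belief, there are no arrivals, and every scheduled transmission is treated as successful.

```python
import random

def run_frames(queues, index_values, calibrate, num_frames, frame_len, seed=0):
    """Toy frame-based loop: weights are frozen at the queue lengths q[kT].

    queues       -- list of queue lengths, updated in place
    index_values -- fixed per-user Whittle index values (simplification)
    calibrate    -- hypothetical function: weights -> (omega, rho)
    Returns a log of (frame, slot, decisions) tuples.
    """
    rng = random.Random(seed)
    log = []
    for k in range(num_frames):
        weights = list(queues)            # snapshot q[kT], used for the whole frame
        omega, rho = calibrate(weights)   # stand-in for the initialization phase
        for t in range(frame_len):
            decisions = []
            for i, w in enumerate(weights):
                value = w * index_values[i]   # queue-weighted index value
                if value > omega:
                    decisions.append(True)
                elif value < omega:
                    decisions.append(False)
                else:
                    decisions.append(rng.random() < rho)
            log.append((k, t, decisions))
            for i, scheduled in enumerate(decisions):
                if scheduled and queues[i] > 0:
                    queues[i] -= 1            # toy service model, no arrivals
    return log
```

The point of the sketch is the snapshot: within a frame the weights do not track the changing queue lengths, so the threshold and randomization factor only need to be recomputed once per frame.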

The next proposition establishes throughput-optimality of the frame-based queue-weighted index policy. 

Proposition 1. For any $\epsilon > 0$, there exist $T'$ and $\tau'$ such that, if $T > T'$ and $\tau > \tau'$, then for any arrival rate $\boldsymbol{\lambda}$ within
the achievable rate region $\Gamma - \epsilon\mathbf{1}$, under the T-frame queue-weighted index policy $QWI^{\tau}(T, M - \epsilon/2)$: (i) all queues are
stable, (ii) the constraint (1) on the average number of transmissions is satisfied.

Proof: We prove the proposition by first establishing the uniform convergence of the finite horizon throughput performance
in a frame to the infinite horizon throughput. We then apply Lemma 2 to show that the average Lyapunov drift in each frame
is negative, which establishes the throughput-optimality. Details of the proof are given in Appendix C. ■

Remarks: (1) Note that, in Proposition 1, the parameter $M$ in the queue-weighted index policy is scaled down by $\epsilon/2$. This
mechanism is needed to guarantee the constraint on the long-term average number of transmissions. The details are given in
the proof.

(2) In the queue-weighted index policy, a user is scheduled based on its queue-weighted Whittle's index value. This is
especially interesting because of the following: a simple multiplication of queue length and Whittle's index value captures
the importance of scheduling a user under two sophisticated system features, namely the queue evolution and the fundamental
exploration-exploitation tradeoff.

(3) Calculation of the queue-weighted index value is very simple: it only requires scaling the pre-calculated Whittle's
index value. Under the queue-weighted index policy, in each frame, the initialization phase of $\phi^{\tau}(\mathbf{q}[kT], M - \epsilon/2)$ has
computational complexity $O(N \log N)$, while implementing $\phi^{\tau}(\mathbf{q}[kT], M - \epsilon/2)$ over the frame has complexity $O(TN)$
(see the remark in Section III-B). Accordingly, the per-frame complexity is $O(N \log N + TN)$. Therefore, as the frame
length $T$ scales up, the per-slot complexity decreases toward $O(N)$.

(4) The scheduling decisions are made by comparing each user's own index value to a threshold, independently of other
users. Hence our policy is also applicable for distributed implementation in uplink scenarios.
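The per-slot figure in Remark (3) follows from amortizing the per-frame cost over the $T$ slots of a frame. The worked step below is illustrative only; the numerical values are arbitrary and chosen purely to show the scaling, with the truncation size treated as a constant.

```latex
\[
\frac{O(N \log N) + O(TN)}{T}
  \;=\; O\!\left( N + \frac{N \log N}{T} \right)
  \;\longrightarrow\; O(N) \quad \text{as } T \to \infty .
\]
% Illustrative numbers (assumed, not from the paper): with N = 1000 users and
% frame length T = 100, N \log_2 N / T \approx 1000 \cdot 10 / 100 = 100,
% which is already small compared with the O(N) = 1000 per-slot term.
```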

Corollary 1. The achievable rate region $\Gamma$, expressed in (3), is equal to the stability region $\Lambda$.

Proof: Recall that the achievable rate region $\Gamma$ provides an upper bound to the stability region $\Lambda$. Since the previous
proposition states that the queue-weighted index policy stabilizes arrival rates arbitrarily close to the boundary of the
achievable rate region $\Gamma$, the achievable rate region $\Gamma$ and the stability region $\Lambda$ share the same interior. Because
both regions $\Gamma$ and $\Lambda$ are defined as closures of sets, we have $\Gamma = \Lambda$. ■

Proposition 1 requires the state-space truncation size $\tau$ to be large enough for throughput-optimality in the region $\Gamma - \epsilon\mathbf{1}$.
We next characterize the relationship between the truncation size $\tau$ and the size of the corresponding supportable region,
where, recall, the expression of $\tau_0$ is given in (8).
Proposition 2. (a) If $\tau > \tau_0$, there exist $T_0$ and a function $g(\tau)$ such that, if $T > T_0$, for all arrival rates within the stability region
$\Lambda - g(\tau)\mathbf{1}$, under the T-frame queue-weighted index policy $QWI^{\tau}(T, M - g(\tau)/2)$, all queues are stable, and constraint (1)
on the average number of transmissions is satisfied.

(b) The function $g(\tau) = 3 \sum_{i=1}^{N} \alpha_i\big(W_i(b^i_{0,\tau}), 1\big)$ and satisfies $\lim_{\tau\to\infty} g(\tau) = 0$.

Proof: In the proof, we use Lemma 2 to bound the throughput performance difference between the truncated scenario and
the non-truncated case. For details, please refer to Appendix D. ■

Remark: Proposition 2 allows one to upper bound the state-space truncation size $\tau$ that ensures the throughput-optimality in
any region $\Lambda - \epsilon\mathbf{1}$, when the frame length $T$ is sufficiently large. We believe that, by implementing the policy with expanding
frame duration, the dependence on $T_0$ in Proposition 2 can be removed while preserving the low complexity.

V. Conclusion 

In this work, we have studied the downlink scheduling problem over Markovian ON/OFF fading channels with
imperfect instantaneous channel state information. The scheduling decisions are made based on the single-bit ARQ-type
feedback and the channel memory inherent in the Markovian channels. We proposed a throughput-optimal policy that operates
over time frames and an appropriately truncated belief state space. In the proposed policy, the importance of scheduling a user
is measured by a simple multiplication of the queue length and Whittle's index value. Based on this key observation, we
developed an index-based policy that is not only throughput-optimal, but also has low complexity per frame in the network size
and the truncation level of the belief state space. Most notably, our policy does not suffer from the curse of dimensionality
that is observed in earlier works in this context. We further identified a closed form relationship between the size of the
state space truncation and the achievable throughput region, which is important in the practical implementation of our
low-complexity solution.
Appendix A
Proof of Lemma 1

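The proof of the lemma is an extension of the proof of Proposition 1 in [13]. Consider the problem $\Psi(\mathbf{r}, M)$ with weight
vector $\mathbf{r}$. The constraint (5) can be written in an equivalent form that requires at least $N - M$ channels to be passive on
average, i.e.,

\[ \liminf_{T\to\infty} \frac{1}{T}\, E\Big[ \sum_{t=0}^{T-1} \sum_{i=1}^{N} \big(1 - a_i^{\phi}[t]\big) \Big] \ge N - M. \tag{11} \]

Associating a Lagrange multiplier $\omega$ with the constraint (11), we have the following Lagrangian function $L(\phi, \omega)$ for problem
$\Psi(\mathbf{r}, M)$,

\[ L(\phi, \omega) = \liminf_{T\to\infty} \frac{1}{T}\, E\Big[ \sum_{t=0}^{T-1} \sum_{i=1}^{N} r_i \cdot \pi_i[t] \cdot a_i^{\phi}[t] \Big] + \omega \cdot \liminf_{T\to\infty} \frac{1}{T}\, E\Big[ \sum_{t=0}^{T-1} \sum_{i=1}^{N} \big(1 - a_i^{\phi}[t]\big) \Big] - \omega \cdot (N - M). \tag{12} \]

The dual function $D(\omega)$ is defined as $D(\omega) = \max_{\phi \in \Phi} L(\phi, \omega)$. Following the lines of proof in [13], we have

\[ D(\omega) = \sum_{i=1}^{N} U_i^{r_i}(\omega) - \omega (N - M), \]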



in which $U_i^{r_i}(\omega)$ is the value of an $\omega$-subsidy problem under weight $r_i$,

\[ U_i^{r_i}(\omega) = \max_{\phi_i \in \Phi_i}\ \limsup_{T\to\infty} \frac{1}{T}\, E\Big[ \sum_{t=0}^{T-1} \Big( r_i \cdot \pi_i[t] \cdot a_i^{\phi_i}[t] + \omega \cdot \big(1 - a_i^{\phi_i}[t]\big) \Big) \Big], \tag{13} \]

where $\Phi_i$ denotes the set of scheduling policies that activate and idle user $i$ according to the observed channel history.
In the above problem (13), channel $i$ at belief state $\pi_i$ receives a reward $r_i \pi_i$ when it activates; otherwise it
receives a subsidy $\omega$ for passivity. We let $\mathcal{P}_i(\omega) \subseteq \mathcal{B}_i$ be the set of belief states for which it is optimal to stay idle.

Under the unit weight $r_i = 1$, it was shown in [17] that the problem is Whittle indexable, i.e., $\mathcal{P}_i(\omega)$ monotonically
increases from $\emptyset$ to $\mathcal{B}_i$ as $\omega$ increases to $\infty$ for each user $i$. The Whittle's index value $W_i(\pi)$ is defined as the
infimum subsidy value for which the belief state $\pi$ is at the boundary of $\mathcal{P}_i(\omega)$, i.e.,

\[ W_i(\pi) = \inf\{\omega : \pi \in \mathcal{P}_i(\omega)\}. \]

It follows from [12], [13] that, for the $\omega$-subsidy problem under unit weight $r_i = 1$, the optimal policy is to activate the
user at time slot $t$ if $W_i(\pi) > \omega$, and to stay idle if $W_i(\pi) < \omega$, with ties broken arbitrarily if $W_i(\pi) = \omega$.

We next extend the optimal algorithm for the $\omega$-subsidy problem under unit weight to the general case with arbitrary
weight $r_i$. An equivalent form of $U_i^{r_i}(\omega)$ is given as follows,

\[ U_i^{r_i}(\omega) = r_i \cdot \max_{\phi_i \in \Phi_i}\ \limsup_{T\to\infty} \frac{1}{T}\, E\Big[ \sum_{t=0}^{T-1} \Big( \pi_i[t] \cdot a_i^{\phi_i}[t] + \frac{\omega}{r_i} \big(1 - a_i^{\phi_i}[t]\big) \Big) \Big]. \tag{14} \]

Therefore, the optimal solution for the $\omega$-subsidy problem (13) with weight $r_i$ takes the same form as the optimal solution
for the $\omega/r_i$-subsidy problem with weight 1. Accordingly, the optimal solution takes the following form: user $i$ is scheduled
at slot $t$ if $W_i(\pi_i[t]) > \omega/r_i$, and stays idle if $W_i(\pi_i[t]) < \omega/r_i$, with ties broken arbitrarily if $W_i(\pi_i[t]) = \omega/r_i$.

We define the r-weighted index value as $W_i^{r}(\pi) = r_i \cdot W_i(\pi)$, $\pi \in \mathcal{B}_i$, $i \in \{1, \ldots, N\}$. The optimal policy for the reward
maximization problem in (13) is then to activate user $i$ at time slot $t$ if $W_i^{r}(\pi) > \omega$, and to stay idle if $W_i^{r}(\pi) < \omega$,
with ties broken arbitrarily if $W_i^{r}(\pi) = \omega$. Therefore the dual function value $D(\omega)$ can be achieved by a threshold-based
policy implemented over the r-weighted index values $W_i^{r}(\pi)$. We shall denote the policy as $\phi(\omega, \rho)$.

Following proof techniques similar to those of Lemma 11 in [13], by appropriately choosing the threshold $\omega^*$ and the
corresponding randomization parameter $\rho^*$ (for which each user at the index value $\omega^*$ activates with probability $\rho^*$) such
that the constraint (1) on the average number of transmissions is satisfied with equality, the corresponding policy is
optimal for the problem $\Psi(\mathbf{r}, M)$. Denoting such a policy as $\phi^*(\mathbf{r}, M)$, the lemma is proven.



Appendix B
Proof of Lemma 2

Proof: Recall that, in the non-truncated state space, the optimal policy $\phi^*(\mathbf{r}, M)$ corresponds to the parameter pair $(\omega^*, \rho^*)$.
Also suppose that, in the truncated state space, the policy $\phi^{\tau}(\mathbf{r}, M)$ corresponds to the parameter pair $(\omega_\tau, \rho_\tau)$. Over the
non-truncated belief state space under a policy with the parameter pair $(\omega, \rho)$, we let $\bar{\alpha}_i(\omega, \rho)$ denote the expected activation time
of user $i$, and let $\bar{\nu}_i(\omega, \rho)$ denote the expected transmission rate contributed by user $i$. Correspondingly, over the truncated
belief state space, the expected activation time and transmission rate are denoted by $\alpha_i(\omega, \rho)$ and $\nu_i(\omega, \rho)$, respectively.
We proceed with the following lemma that provides key properties of $\alpha_i(\omega, \rho)$ and $\nu_i(\omega, \rho)$.

Lemma 3. For a user $i$, if $\tau > \tau_0$, we have

(a) fixing $\omega$, both $\alpha_i(\omega, \rho)$ and $\nu_i(\omega, \rho)$ increase with $\rho$;

(b) for any two parameter pairs $(\omega_1, \rho_1)$ and $(\omega_2, \rho_2)$,

\[ \big| \nu_i(\omega_1, \rho_1) - \nu_i(\omega_2, \rho_2) \big| \le r_i \cdot \big| \alpha_i(\omega_1, \rho_1) - \alpha_i(\omega_2, \rho_2) \big|. \]

Proof: The lemma is proven via detailed study of the closed form relationship between $\alpha_i(\omega, \rho)$ and $\nu_i(\omega, \rho)$. Details of
the proof are moved to Appendix E. ■

We next prove Lemma 2 under two cases.

Case (1). Suppose the threshold $\omega^*$ satisfies $W_i^{r}(b^i_{0,\tau}) \ge \omega^*$ for all users $i$. Then by setting $\omega_\tau = \omega^*$, $\rho_\tau = \rho^*$ in the initialization
phase of policy $\phi^{\tau}(\mathbf{r}, M)$, the expected number of transmissions equals $M$. Therefore, the policy $\phi^{\tau}(\mathbf{r}, M)$ is equivalent
to the policy $\phi^*(\mathbf{r}, M)$. Thus $|V(\mathbf{r}, M) - V^{\tau}(\mathbf{r}, M)| = 0$.

Case (2). If there exists a user $i$ with $W_i^{r}(b^i_{0,\tau}) < \omega^*$, we let $\Theta$ denote the set $\Theta = \{i : W_i^{r}(b^i_{0,\tau}) < \omega^*\}$. Then,

\[ \big| V(\mathbf{r}, M) - V^{\tau}(\mathbf{r}, M) \big| = \Big| \sum_{i=1}^{N} \bar{\nu}_i(\omega^*, \rho^*) - \sum_{i=1}^{N} \nu_i(\omega_\tau, \rho_\tau) \Big| \le \sum_{i \in \Theta} \big| \bar{\nu}_i(\omega^*, \rho^*) - \nu_i(\omega_\tau, \rho_\tau) \big| + \sum_{i \notin \Theta} \big| \bar{\nu}_i(\omega^*, \rho^*) - \nu_i(\omega_\tau, \rho_\tau) \big|. \tag{15} \]

We first show that, if $\Theta \ne \emptyset$, we have $\omega_\tau > \omega^*$, or $\omega_\tau = \omega^*$ with $\rho_\tau \le \rho^*$.

For any user i £ 0, we have ai(ui* , p*) = cti(us*,p*). For user i G 0, we have oti(ui* , p*) = oti(WJ[(b l s ), 1). It can 
be shown from © that ai(W7(6' g ), 1) > ai(W[(b^ T ),0),i € 6. Also, from Lemmata), we have ai(W7(6 0jT ), 0) > 



a.i{oj*,p*),i G 0. Therefore, o<i(u;*,/9*) > a* (a;*, p*), i G and we have X)<=i P*)> Z)»=i /?*) = M. So 

if we implement the policy with threshold parameters (us*,p*) over the truncated belief space, the expected number of 
transmissions will exceed the constraint. Hence, to ensure the constraint of expected transmissions over the truncated state 
space, it must be that ui T > ui* , or u> T = ut* with p T < p* . With this property and from Lemma [3} a), we have 



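$$\nu_i(\omega_\tau,\rho_\tau) \le \nu_i(W_i(b^i_{0,\tau}),1), \tag{16}$$
$$\nu_i(\omega^*,\rho^*) \le \nu_i(W_i(b^i_{0,\tau}),1), \tag{17}$$
$$\alpha_i(\omega_\tau,\rho_\tau) \le \alpha_i(W_i(b^i_{0,\tau}),1), \tag{18}$$
$$\alpha_i(\omega^*,\rho^*) \le \alpha_i(W_i(b^i_{0,\tau}),1), \tag{19}$$

where $i \in \Theta$ and, in (17) and (19), the right-hand sides take the same values over the original and the truncated belief spaces.

Hence, from (16)-(17),

$$|\nu_i(\omega^*,\rho^*) - \nu_i(\omega_\tau,\rho_\tau)| \le \nu_i(W_i(b^i_{0,\tau}),1) \le r_i \cdot \alpha_i(W_i(b^i_{0,\tau}),1), \quad i \in \Theta. \tag{20}$$

For $i \notin \Theta$, since $\nu_i(\omega^*,\rho^*)$ and $\alpha_i(\omega^*,\rho^*)$ take the same values over the original and the truncated belief spaces, we have

$$\sum_{i\notin\Theta} |\nu_i(\omega^*,\rho^*) - \nu_i(\omega_\tau,\rho_\tau)| \le \sum_{i\notin\Theta} r_i\,|\alpha_i(\omega^*,\rho^*) - \alpha_i(\omega_\tau,\rho_\tau)| = \sum_{i\notin\Theta} r_i \cdot [\alpha_i(\omega^*,\rho^*) - \alpha_i(\omega_\tau,\rho_\tau)] \le \sum_{i\notin\Theta} r_i \cdot \sum_{i\notin\Theta} [\alpha_i(\omega^*,\rho^*) - \alpha_i(\omega_\tau,\rho_\tau)], \tag{21}$$

where the first inequality is from Lemma 3(b). Since $\sum_{i=1}^{N} \alpha_i(\omega^*,\rho^*) = \sum_{i=1}^{N} \alpha_i(\omega_\tau,\rho_\tau) = M$, we have

$$\sum_{i\notin\Theta} [\alpha_i(\omega^*,\rho^*) - \alpha_i(\omega_\tau,\rho_\tau)] = \sum_{i\in\Theta} [\alpha_i(\omega_\tau,\rho_\tau) - \alpha_i(\omega^*,\rho^*)] \le \sum_{i\in\Theta} |\alpha_i(\omega_\tau,\rho_\tau) - \alpha_i(\omega^*,\rho^*)|. \tag{22}$$

Note that, for $i \in \Theta$, from (18)-(19),

$$|\alpha_i(\omega_\tau,\rho_\tau) - \alpha_i(\omega^*,\rho^*)| \le \alpha_i(W_i(b^i_{0,\tau}),1) \quad \text{for } i \in \Theta. \tag{23}$$

Substituting (22)-(23) in (21),

$$\sum_{i\notin\Theta} |\nu_i(\omega^*,\rho^*) - \nu_i(\omega_\tau,\rho_\tau)| \le \sum_{i\notin\Theta} r_i \cdot \sum_{j\in\Theta} \alpha_j(W_j(b^j_{0,\tau}),1) \le \sum_{i\notin\Theta} r_i \cdot \sum_{j=1}^{N} \alpha_j(W_j(b^j_{0,\tau}),1). \tag{24}$$

From (20) and (24), the difference in (15) can be bounded by

$$|V(\mathbf{r},M) - V_\tau(\mathbf{r},M)| \le \sum_{i\in\Theta} r_i \cdot \alpha_i(W_i(b^i_{0,\tau}),1) + \sum_{i\notin\Theta} r_i \cdot \sum_{j=1}^{N} \alpha_j(W_j(b^j_{0,\tau}),1) \le \sum_{i=1}^{N} r_i \cdot \sum_{j=1}^{N} \alpha_j(W_j(b^j_{0,\tau}),1).$$

Letting $f(\tau) = \sum_{i=1}^{N} \alpha_i(W_i(b^i_{0,\tau}),1)$, the lemma is established. ■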
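As a numerical illustration of how $f(\tau)$ behaves, the following is a minimal sketch under stated assumptions. It assumes the standard two-state Markov belief update $b_{0,h+1} = b_{0,h}\,p + (1-b_{0,h})\,p_{01}$ with $b_{0,1} = p_{01}$ (the belief recursion is defined outside this section, and `p01` is a hypothetical name for the OFF-to-ON probability). It also uses $\alpha_i(W_i(b^i_{0,\tau}),1) = (1-p_i+b^i_{0,\tau})/(b^i_{0,\tau}+(1-p_i)\tau)$, which follows from the closed-form expressions in Appendix E with $\rho = 1$. The channel parameters are made up for illustration.

```python
def belief_after_off(p, p01, h):
    """Belief that the channel is ON, h slots after an observed OFF (assumed update)."""
    b = p01
    for _ in range(h - 1):
        b = b * p + (1.0 - b) * p01
    return b


def alpha_at_tau(p, p01, tau):
    """alpha_i(W_i(b_{0,tau}), 1): the user transmits once its belief reaches b_{0,tau}."""
    b = belief_after_off(p, p01, tau)
    return (1.0 - p + b) / (b + (1.0 - p) * tau)


def f(tau, channels):
    """f(tau) = sum over users of alpha_i(W_i(b_{0,tau}), 1)."""
    return sum(alpha_at_tau(p, p01, tau) for p, p01 in channels)


# Hypothetical (p_i, p01_i) pairs for three positively correlated channels.
channels = [(0.8, 0.2), (0.9, 0.3), (0.7, 0.1)]
values = [f(tau, channels) for tau in (1, 10, 100, 1000)]
print(values)
```

Under these assumptions the values shrink roughly like $1/\tau$, which is the property the later stability argument relies on.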

Appendix C 
Proof of Proposition 1

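Define the Lyapunov function $L(\mathbf{q}) = \frac{1}{2}\sum_{i=1}^{N} q_i^2$. We consider the $T$-frame average Lyapunov drift $\Delta L(\mathbf{q}[kT])$ over the $k$-th frame, expressed as

$$\Delta L(\mathbf{q}[kT])/T = \frac{1}{T}\,E\big[L(\mathbf{q}[(k+1)T]) - L(\mathbf{q}[kT]) \,\big|\, \mathbf{q}[kT], \boldsymbol{\pi}[kT]\big] \le BT + \sum_{i=1}^{N} q_i[kT]\,\lambda_i - \sum_{i=1}^{N} q_i[kT] \cdot \frac{1}{T}\,E\Big[\sum_{t=0}^{T-1} \pi_i[kT+t] \cdot a_i^{\phi_\tau(\mathbf{q}[kT],\,M-\epsilon/2)}[kT+t] \,\Big|\, \boldsymbol{\pi}[kT]\Big], \tag{25}$$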

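where $B$ is a constant whose value is determined by the second moment of the arrival process [18]. Because $\boldsymbol{\lambda}$ lies within the stability region $\Lambda - \epsilon\mathbf{1}$, we have $\boldsymbol{\lambda} + \epsilon\mathbf{1} \in \Lambda$. Therefore, for any vector $\mathbf{q}$,

$$\sum_{i=1}^{N} q_i\,(\lambda_i + \epsilon) \le V(\mathbf{q},M),$$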



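where $V(\mathbf{q}, M)$ is defined in l@)-(|3). The Lyapunov drift (25) now becomes

$$\Delta L(\mathbf{q}[kT])/T \le BT - \epsilon\sum_{i=1}^{N} q_i[kT] + V(\mathbf{q}[kT], M) - V_\tau^T(\mathbf{q}[kT], M-\epsilon/2), \tag{26}$$

where $V_\tau^T(\mathbf{q}[kT], M)$ is the $T$-horizon expected transmission rate achieved under the policy $\phi_\tau(\mathbf{q}[kT], M)$, i.e.,

$$V_\tau^T(\mathbf{q}[kT], M) = \sum_{i=1}^{N} q_i[kT] \cdot \frac{1}{T}\,E\Big[\sum_{t=0}^{T-1} \pi_i[kT+t] \cdot a_i^{\phi_\tau(\mathbf{q}[kT],M)}[kT+t] \,\Big|\, \boldsymbol{\pi}[kT]\Big].$$

We denote by $Z_\tau^T(\mathbf{q},M)$ the finite $T$-horizon expected number of transmissions under the policy $\phi_\tau(\mathbf{q}[kT], M)$, i.e.,

$$Z_\tau^T(\mathbf{q},M) = \frac{1}{T}\,E\Big[\sum_{t=0}^{T-1}\sum_{i=1}^{N} a_i^{\phi_\tau(\mathbf{q},M)}[kT+t] \,\Big|\, \boldsymbol{\pi}[kT]\Big].$$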

The next lemma states that, as the length of the time horizon tends to infinity, the expected rate achieved over a finite horizon converges to the infinite-horizon achievable rate, and the expected number of transmissions converges to the value $M$.

Lemma 4. For any $M$ and $\kappa > 0$, we have, uniformly over $\mathbf{q}$, $M$, and the initial state $\boldsymbol{\pi}[kT]$,

(a) there exist positive constants $c_1$ and $c_2$ such that

$$|V_\tau(\mathbf{q},M) - V_\tau^T(\mathbf{q},M)| \le \big(\kappa + c_1\exp(-c_2 T)\big)\sum_{i=1}^{N} q_i;$$

(b) there exist positive constants $d_1$ and $d_2$ such that

$$Z_\tau^T(\mathbf{q},M) - M \le \kappa + d_1\exp(-d_2 T).$$
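Proof: We first prove part (a). We define the random variable $f_\tau^T(\mathbf{q},M)$ as

$$f_\tau^T(\mathbf{q},M) = \sum_{i=1}^{N} q_i \cdot \frac{1}{T}\sum_{t=0}^{T-1} \pi_i[kT+t] \cdot a_i^{\phi_\tau(\mathbf{q},M)}[kT+t].$$

Therefore, $V_\tau^T(\mathbf{q},M) = E[f_\tau^T(\mathbf{q},M)]$. We let the event $\Omega := \{|f_\tau^T(\mathbf{q},M) - V_\tau(\mathbf{q},M)| < \kappa\sum_{i=1}^{N} q_i\}$. Then

$$|V_\tau^T(\mathbf{q},M) - V_\tau(\mathbf{q},M)| \le E\big[|f_\tau^T(\mathbf{q},M) - V_\tau(\mathbf{q},M)|\big] = E\big[|f_\tau^T(\mathbf{q},M) - V_\tau(\mathbf{q},M)| \,\big|\, \Omega\big]\Pr(\Omega) + E\big[|f_\tau^T(\mathbf{q},M) - V_\tau(\mathbf{q},M)| \,\big|\, \Omega^c\big]\Pr(\Omega^c) \le \kappa\sum_{i=1}^{N} q_i + \sum_{i=1}^{N} q_i \cdot \Pr\Big(|f_\tau^T(\mathbf{q},M) - V_\tau(\mathbf{q},M)| \ge \kappa\sum_{i=1}^{N} q_i\Big). \tag{27}$$

Note that

$$|f_\tau^T(\mathbf{q},M) - V_\tau(\mathbf{q},M)| = \Big|\sum_{i=1}^{N} q_i \cdot \Big[\frac{1}{T}\sum_{t=0}^{T-1} \pi_i[kT+t] \cdot a_i^{\phi_\tau(\mathbf{q},M)}[kT+t] - \lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1} \pi_i[kT+t] \cdot a_i^{\phi_\tau(\mathbf{q},M)}[kT+t]\Big]\Big| \le \Big(\sum_{i=1}^{N} q_i\Big)\,\|\boldsymbol{\eta}(\mathbf{q},M) - \boldsymbol{\eta}_T(\mathbf{q},M)\|,$$

where the inequality follows from the Cauchy-Schwarz inequality together with $\|\mathbf{q}\| \le \sum_{i=1}^{N} q_i$, and $\boldsymbol{\eta}(\mathbf{q},M)$ and $\boldsymbol{\eta}_T(\mathbf{q},M)$ are vectors with components

$$\eta_i(\mathbf{q},M) = \lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1} \pi_i[kT+t] \cdot a_i^{\phi_\tau(\mathbf{q},M)}[kT+t], \qquad \eta_{T,i}(\mathbf{q},M) = \frac{1}{T}\sum_{t=0}^{T-1} \pi_i[kT+t] \cdot a_i^{\phi_\tau(\mathbf{q},M)}[kT+t].$$

Therefore,

$$\Pr\Big(|f_\tau^T(\mathbf{q},M) - V_\tau(\mathbf{q},M)| \ge \kappa\sum_{i=1}^{N} q_i\Big) \le \Pr\big(\|\boldsymbol{\eta}(\mathbf{q},M) - \boldsymbol{\eta}_T(\mathbf{q},M)\| \ge \kappa\big) \le \Pr\Big(\bigcup_{i=1}^{N}\big\{|\eta_{T,i}(\mathbf{q},M) - \eta_i(\mathbf{q},M)| \ge \kappa/N\big\}\Big) \le \sum_{i=1}^{N} \Pr\big(|\eta_{T,i}(\mathbf{q},M) - \eta_i(\mathbf{q},M)| \ge \kappa/N\big). \tag{28}$$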



Recall that, under the policy $\phi_\tau(\mathbf{q}, M)$, the belief states of different users are sorted, in the initialization phase, in the vector $\mathbf{w}$. Therefore, each weighting vector $\mathbf{q}$ corresponds to a vector $\mathbf{w}$, in which the belief states are ordered according to their queue-weighted index values. Note that, over the truncated state space, the total number of different belief state orders is finite. Also note that, for all policies that correspond to the same order of belief states, the belief value of each user evolves as the same finite-state, aperiodic Markov chain with one communicating class (see footnote 2). Therefore, for each user $i$, under all policies that correspond to the same order, there exist constants $c_1^i$ and $c_2^i$ such that, regardless of the initial belief state [20],

$$\Pr\Big(\Big|\frac{1}{T}\sum_{t=0}^{T-1} \pi_i[kT+t] \cdot a_i^{\phi_\tau(\mathbf{q},M)}[kT+t] - \lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1} \pi_i[kT+t] \cdot a_i^{\phi_\tau(\mathbf{q},M)}[kT+t]\Big| \ge \kappa/N\Big) \le c_1^i\exp(-c_2^i T).$$

Note that the number of users, as well as the number of orders of belief states, is finite. From (28), there exist constants $c_1$ and $c_2$ such that, regardless of $\mathbf{q}$ and the initial belief state,

$$\Pr\Big(|f_\tau^T(\mathbf{q},M) - V_\tau(\mathbf{q},M)| \ge \kappa\sum_{i=1}^{N} q_i\Big) \le c_1\exp(-c_2 T).$$

Substituting the above inequality in (27), part (a) thus holds.

The proof of part (b) follows an approach similar to that of part (a). Here, the immediate reward is $a_i[kT+t]$ instead of $\pi_i[kT+t] \cdot a_i[kT+t]$. ■
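The conditioning step behind (27) can be checked on synthetic data. The sketch below is illustrative only: it assumes a generic bounded random variable $X$ with $|X| \le S$ (playing the role of the deviation between the finite-horizon reward and $V_\tau$, with $S$ in the role of $\sum_i q_i$), and verifies on samples that the mean of $|X|$ is at most $\kappa S + S\cdot\Pr(|X| \ge \kappa S)$. The distribution and constants are made up.

```python
import random


def conditioning_bound(samples, kappa, s):
    """Return (mean of |x|, kappa*s + s * empirical frequency of |x| >= kappa*s)."""
    n = len(samples)
    mean_abs = sum(abs(x) for x in samples) / n
    tail = sum(1 for x in samples if abs(x) >= kappa * s) / n
    return mean_abs, kappa * s + s * tail


rng = random.Random(1)
s, kappa = 4.0, 0.1                                   # hypothetical bound and tolerance
samples = [rng.gauss(0.0, 0.5) for _ in range(10_000)]
samples = [max(-s, min(s, x)) for x in samples]       # enforce |x| <= s
lhs, rhs = conditioning_bound(samples, kappa, s)
print(lhs <= rhs)
```

The inequality holds sample by sample: a deviation below $\kappa S$ contributes less than $\kappa S$, and any other deviation contributes at most $S$.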



2 In case the threshold is at an index value shared by more than one user, we assume that a fixed tie-breaking order is applied.

The next lemma bounds the difference between the reward functions $V_\tau(\mathbf{q}, M-\epsilon)$ and $V_\tau(\mathbf{q}, M)$.

Lemma 5. When $\tau > \tau_0$, the difference between the expected transmission rates achieved under the policies $\phi_\tau(\mathbf{q}, M)$ and $\phi_\tau(\mathbf{q}, M-\epsilon)$ satisfies the following bound:

$$|V_\tau(\mathbf{q},M) - V_\tau(\mathbf{q},M-\epsilon)| \le \epsilon\sum_{i=1}^{N} q_i.$$

Proof: Suppose that, over the truncated state space under the weight $\mathbf{q}$, the policies $\phi_\tau(\mathbf{q}, M)$ and $\phi_\tau(\mathbf{q}, M-\epsilon)$ correspond to the parameter pairs $(\omega^\tau_M, \rho^\tau_M)$ and $(\omega^\tau_{M-\epsilon}, \rho^\tau_{M-\epsilon})$, respectively.

For user $i$, we let $y_i(\epsilon)$ denote the difference between the activation times under the policies $\phi_\tau(\mathbf{q}, M)$ and $\phi_\tau(\mathbf{q}, M-\epsilon)$, i.e., $y_i(\epsilon) = \alpha_i(\omega^\tau_M, \rho^\tau_M) - \alpha_i(\omega^\tau_{M-\epsilon}, \rho^\tau_{M-\epsilon})$. From Lemma 3(a), we have $y_i(\epsilon) \ge 0$ for all $i$. Since the difference in the total expected number of transmissions between the two policies is $\epsilon$, we have $\sum_{i=1}^{N} y_i(\epsilon) = \epsilon$. Recall that $\nu_i(\omega, \rho)$ denotes the expected throughput contributed by user $i$ under a policy with threshold $\omega$ and randomization factor $\rho$. From Lemma 3(b), we have

$$|V_\tau(\mathbf{q},M) - V_\tau(\mathbf{q},M-\epsilon)| \le \sum_{i=1}^{N} |\nu_i(\omega^\tau_M, \rho^\tau_M) - \nu_i(\omega^\tau_{M-\epsilon}, \rho^\tau_{M-\epsilon})| \le \sum_{i=1}^{N} q_i\,|\alpha_i(\omega^\tau_M, \rho^\tau_M) - \alpha_i(\omega^\tau_{M-\epsilon}, \rho^\tau_{M-\epsilon})| = \sum_{i=1}^{N} q_i\,y_i(\epsilon) \le \sum_{i=1}^{N} q_i \cdot \sum_{i=1}^{N} y_i(\epsilon) = \epsilon\sum_{i=1}^{N} q_i.$$

We have hence proved Lemma 5. ■
From Lemmas 2-5, the Lyapunov drift (26) can be further bounded as follows:

$$\Delta L(\mathbf{q}[kT])/T \le BT - \epsilon\sum_{i=1}^{N} q_i[kT] + V(\mathbf{q}[kT], M) - V_\tau(\mathbf{q}[kT], M) + V_\tau(\mathbf{q}[kT], M) - V_\tau(\mathbf{q}[kT], M-\epsilon/2) + V_\tau(\mathbf{q}[kT], M-\epsilon/2) - V_\tau^T(\mathbf{q}[kT], M-\epsilon/2)$$
$$\le BT + \Big[-\epsilon + f(\tau) + \epsilon/2 + \big(\kappa + c_1\exp(-c_2 T)\big)\Big] \cdot \sum_{i=1}^{N} q_i[kT]$$
$$= BT + \Big[-\epsilon/2 + f(\tau) + \big(\kappa + c_1\exp(-c_2 T)\big)\Big] \cdot \sum_{i=1}^{N} q_i[kT]. \tag{29}$$

Since $f(\tau) = \sum_{i=1}^{N} \alpha_i(W_i(b^i_{0,\tau}),1)$ can be made arbitrarily small as $\tau$ becomes large, $c_1\exp(-c_2 T)$ approaches zero as $T$ grows, and $\kappa$ can be arbitrarily small, the Lyapunov drift becomes negative when $\tau > \tau' > \tau_0$ and $T$ is large enough, e.g., $T > T_1$. From the Foster-Lyapunov stability criterion [19], all the queues in the system are hence stable.

Note that, under the queue-weighted policy $\mathrm{QWI}_\tau(T, M-\epsilon/2)$, the expected number of transmissions in the $k$-th frame, $Z_\tau^T(\mathbf{q}[kT], M-\epsilon/2)$, is bounded by Lemma 4 as

$$Z_\tau^T(\mathbf{q}[kT], M-\epsilon/2) - (M-\epsilon/2) \le \kappa + d_1\exp(-d_2 T).$$

Therefore, there exists $T_2$ such that $Z_\tau^T(\mathbf{q}[kT], M-\epsilon/2) \le M$ regardless of $\mathbf{q}[kT]$ and $\boldsymbol{\pi}[kT]$, and the long-term constraint on the average number of transmissions is satisfied. Letting $T' = \max\{T_1, T_2\}$, the proposition is established.

Appendix D
Proof of Proposition 2

The proof of Proposition 2 follows lines similar to the proof of Proposition 1. For all arrival rates within the stability region $\Lambda - g(\tau)\mathbf{1}$, under the $T$-frame queue-weighted index policy $\mathrm{QWI}_\tau(T, M-g(\tau)/2)$ with $\tau > \tau_0$, we have the following upper bound on the average Lyapunov drift over the $k$-th frame, similar to (29):

$$\Delta L(\mathbf{q}[kT])/T \le BT + \Big\{-g(\tau)/2 + f(\tau) + \big[\kappa + c_1\exp(-c_2 T)\big]\Big\}\sum_{i=1}^{N} q_i[kT] = BT + \Big\{-f(\tau)/2 + \big[\kappa + c_1\exp(-c_2 T)\big]\Big\}\sum_{i=1}^{N} q_i[kT],$$

where the last equality holds because $g(\tau) = 3f(\tau)$. For fixed $\tau$, by choosing $\kappa$ sufficiently small and $T$ sufficiently large, the Lyapunov drift is negative. Therefore, the queues are stable according to the Foster-Lyapunov criterion. Also, similar to the proof of Proposition 1, the long-term constraint on the average number of transmissions is satisfied for sufficiently large $T$. Letting $T_0$ be the value of the frame length that guarantees the negative Lyapunov drift and also satisfies the constraint, part (a) of the proposition holds.

Note that we have $g(\tau) = 3f(\tau)$. From Lemma 2, we have $\lim_{\tau\to\infty} g(\tau) = 0$. Therefore, part (b) of the proposition also holds.

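Appendix E
Proof of Lemma 3

We let $\nu_i(\omega,\rho)$ denote the expected transmission rate contributed by user $i$ under a policy with threshold parameter $\omega$ and randomization factor $\rho$. The expression of $\nu_i(\omega,\rho)$ is given as follows:

$$\nu_i(\omega,\rho) = \begin{cases} r_i\,\dfrac{\rho\,b^i_{0,h} + (1-\rho)\,b^i_{0,h+1}}{\rho\,b^i_{0,h} + (1-\rho)\,b^i_{0,h+1} + (1-p_i)(h+1-\rho)} & \text{if } \omega = W_i(b^i_{0,h}) \text{ and } b^i_{0,h} < b^i_{0,\tau}, \\[2ex] r_i\,\dfrac{\rho\,b^i_{0,\tau} + (1-\rho)\,b^i_s}{\rho\,b^i_{0,\tau} + (1-\rho)\,b^i_s + (1-p_i)(\tau+1-\rho)} & \text{if } \omega = W_i(b^i_{0,\tau}), \\[2ex] r_i\,\dfrac{\rho\,b^i_s}{\tau\rho(1-p_i) + (1-p_i) + \rho\,b^i_s} & \text{if } \omega = W_i(b^i_s), \\[2ex] 0 & \text{if } \omega > W_i(b^i_s). \end{cases}$$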
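The first case of this expression can be sanity-checked against a simulation of the randomized threshold policy. The sketch below is not the authors' code and rests on several assumptions: the channel is a two-state Markov chain with ON-to-ON probability `p` and OFF-to-ON probability `p01` (a hypothetical name), the belief update is $b_{0,h+1} = b_{0,h}\,p + (1-b_{0,h})\,p_{01}$ with $b_{0,1} = p_{01}$, the user transmits in every slot after a success, and after an observed failure it stays idle until the belief reaches $b_{0,h}$, transmits there with probability $\rho$, and otherwise transmits one slot later. The weight is taken as $r_i = 1$.

```python
import random


def belief_after_off(p, p01, h):
    """Belief that the channel is ON, h slots after an observed OFF (assumed update)."""
    b = p01
    for _ in range(h - 1):
        b = b * p + (1.0 - b) * p01
    return b


def nu_closed_form(p, p01, h, rho, r=1.0):
    """First case of the expression above: threshold at W_i(b_{0,h}) with h < tau."""
    big_b = rho * belief_after_off(p, p01, h) + (1.0 - rho) * belief_after_off(p, p01, h + 1)
    return r * big_b / (big_b + (1.0 - p) * (h + 1.0 - rho))


def simulate_throughput(p, p01, h, rho, slots, seed=0):
    """Empirical success rate of the randomized threshold policy (hypothetical simulator)."""
    rng = random.Random(seed)
    channel_on = False   # hidden channel state in the previous slot
    last_ack = False     # True if the previous slot carried a successful transmission
    age = 1              # slots elapsed since the last observed OFF
    successes = 0
    for _ in range(slots):
        channel_on = rng.random() < (p if channel_on else p01)
        if last_ack or age > h:
            transmit = True
        elif age == h:
            transmit = rng.random() < rho
        else:
            transmit = False
        if transmit:
            last_ack = channel_on
            if channel_on:
                successes += 1
            else:
                age = 1      # failure observed: the belief restarts at b_{0,1}
        else:
            age += 1
    return successes / slots


p, p01, h, rho = 0.8, 0.2, 2, 0.5   # hypothetical channel and policy parameters
formula = nu_closed_form(p, p01, h, rho)
estimate = simulate_throughput(p, p01, h, rho, slots=200_000)
print(formula, estimate)
```

The closed form is a renewal-reward ratio: each cycle between observed failures lasts $h+1-\rho$ slots on average before the first transmission, followed by a run of successes with mean length proportional to $1/(1-p_i)$.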



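We first prove part (a). We examine the values of $\alpha_i(\omega,\rho)$ and $\nu_i(\omega,\rho)$ for different values of the threshold $\omega$.

Case (1). If $\omega = W_i(b^i_{0,h})$ and $b^i_{0,h} < b^i_{0,\tau}$, we consider the reciprocal of $\nu_i(\omega,\rho)$,

$$r_i \cdot [\nu_i(\omega,\rho)]^{-1} = 1 + \frac{(1-p_i)(h+1-\rho)}{\rho\,b^i_{0,h} + (1-\rho)\,b^i_{0,h+1}} \tag{30}$$
$$= 1 + \frac{1-p_i}{b^i_{0,h+1} - b^i_{0,h}}\Big(1 - \frac{b^i_{0,h+1} - (h+1)(b^i_{0,h+1} - b^i_{0,h})}{\rho\,(b^i_{0,h} - b^i_{0,h+1}) + b^i_{0,h+1}}\Big), \tag{31}$$

and the reciprocal of $\alpha_i(\omega,\rho)$,

$$[\alpha_i(\omega,\rho)]^{-1} = 1 + \frac{(1-p_i)(h-\rho)}{\rho\,b^i_{0,h} + (1-\rho)\,b^i_{0,h+1} + (1-p_i)} = 1 + \frac{1-p_i}{b^i_{0,h+1} - b^i_{0,h}}\Big(1 - \frac{1-p_i + b^i_{0,h+1} + h\,(b^i_{0,h} - b^i_{0,h+1})}{\rho\,(b^i_{0,h} - b^i_{0,h+1}) + b^i_{0,h+1} + (1-p_i)}\Big). \tag{32}$$

When $\tau > \tau_0$, it can be shown (via studying the derivative) that the numerator $b^i_{0,h+1} - (h+1)(b^i_{0,h+1} - b^i_{0,h})$ inside (31) is positive. Since the denominator in the parenthesis of (31) decreases with $\rho$, $\nu_i(\omega,\rho)$ increases with $\rho$. Also, the numerator of the second term inside the parenthesis of (32) satisfies

$$1 - p_i + b^i_{0,h+1} + h\,(b^i_{0,h} - b^i_{0,h+1}) > 1 - p_i + b^i_{0,h+1} + (h+1)(b^i_{0,h} - b^i_{0,h+1}) > 0.$$

Since the denominator in the parenthesis of (32) decreases with $\rho$, we have that $\alpha_i(\omega,\rho)$ increases with $\rho$.

Case (2). If $\omega = W_i(b^i_{0,\tau})$, we have

$$r_i \cdot [\nu_i(\omega,\rho)]^{-1} = 1 + \frac{(1-p_i)(\tau+1-\rho)}{\rho\,b^i_{0,\tau} + (1-\rho)\,b^i_s} = 1 + \frac{1-p_i}{b^i_s - b^i_{0,\tau}}\Big(1 - \frac{b^i_s - (\tau+1)(b^i_s - b^i_{0,\tau})}{\rho\,(b^i_{0,\tau} - b^i_s) + b^i_s}\Big), \tag{33}$$

$$[\alpha_i(\omega,\rho)]^{-1} = 1 + \frac{(1-p_i)(\tau-\rho)}{\rho\,b^i_{0,\tau} + (1-\rho)\,b^i_s + (1-p_i)} = 1 + \frac{1-p_i}{b^i_s - b^i_{0,\tau}}\Big(1 - \frac{1-p_i + b^i_s + \tau\,(b^i_{0,\tau} - b^i_s)}{\rho\,(b^i_{0,\tau} - b^i_s) + b^i_s + (1-p_i)}\Big). \tag{34}$$

When $\tau > \tau_0$, it can be derived that the numerator $b^i_s - (\tau+1)(b^i_s - b^i_{0,\tau})$ inside (33) is positive. Therefore, $\nu_i(\omega,\rho)$ increases with $\rho$. By a proof similar to that of Case (1), $\alpha_i(\omega,\rho)$ also increases with $\rho$.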

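Case (3). If $\omega = W_i(b^i_s)$,

$$r_i \cdot [\nu_i(\omega,\rho)]^{-1} = \frac{1}{b^i_s}\Big(\tau(1-p_i) + \frac{1-p_i}{\rho} + b^i_s\Big),$$

$$[\alpha_i(\omega,\rho)]^{-1} = \frac{1}{1-p_i+b^i_s}\Big(\frac{(1+\tau\rho)(1-p_i)}{\rho} + b^i_s\Big).$$

It is then clear from the above expressions that both $\alpha_i(\omega,\rho)$ and $\nu_i(\omega,\rho)$ increase with $\rho$.

Case (4). If $\omega > W_i(b^i_s)$, since $\alpha_i(\omega,\rho) = \nu_i(\omega,\rho) = 0$, the statement holds trivially.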
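We proceed to prove part (b) by first establishing the statement when $\omega_1 = \omega_2 = \omega$.

Case (1). If $\omega = W_i(b^i_{0,h})$ and $h < \tau$, from the expressions above we have that

$$\nu_i(\omega,\rho) = r_i\,\frac{\rho\,b^i_{0,h} + (1-\rho)\,b^i_{0,h+1}}{\rho\,b^i_{0,h} + (1-\rho)\,b^i_{0,h+1} + (1-p_i)(h+1-\rho)} = r_i\Big[\alpha_i(\omega,\rho) + \frac{-(1-p_i)}{\rho\,b^i_{0,h} + (1-\rho)\,b^i_{0,h+1} + (1-p_i)(h+1-\rho)}\Big]. \tag{35}$$

Case (2). If $\omega = W_i(b^i_{0,\tau})$, we have

$$\nu_i(\omega,\rho) = r_i\,\frac{\rho\,b^i_{0,\tau} + (1-\rho)\,b^i_s}{\rho\,b^i_{0,\tau} + (1-\rho)\,b^i_s + (1-p_i)(\tau+1-\rho)} = r_i\Big[\alpha_i(\omega,\rho) + \frac{-(1-p_i)}{\rho\,b^i_{0,\tau} + (1-\rho)\,b^i_s + (1-p_i)(\tau+1-\rho)}\Big]. \tag{36}$$

Case (3). If $\omega = W_i(b^i_s)$, we have

$$\nu_i(\omega,\rho) = r_i\,\frac{\rho\,b^i_s}{\tau\rho(1-p_i) + (1-p_i) + \rho\,b^i_s} = r_i\Big[\alpha_i(\omega,\rho) + \frac{-\rho(1-p_i)}{\tau\rho(1-p_i) + (1-p_i) + \rho\,b^i_s}\Big]. \tag{37}$$

Case (4). If $\omega > W_i(b^i_s)$, since $\alpha_i(\omega,\rho) = \nu_i(\omega,\rho) = 0$, the statement holds trivially.

Note that, in the above Cases (1)-(3), the second summand in (35)-(37) decreases with the randomization parameter $\rho$. Since, from part (a), both $\alpha_i(\omega,\rho)$ and $\nu_i(\omega,\rho)$ increase with $\rho$, we have for any $\rho_1 > \rho_2$,

$$0 \le \nu_i(\omega,\rho_1) - \nu_i(\omega,\rho_2) \le r_i\,[\alpha_i(\omega,\rho_1) - \alpha_i(\omega,\rho_2)].$$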

Next consider the case when $\omega_1 \neq \omega_2$. Without loss of generality, we suppose $\omega_1 < \omega_2$. Note that, for any belief value $b^i_{0,h}$, we have $\alpha_i(W_i(b^i_{0,h}),1) = \alpha_i(W_i(b^i_{0,h-1}),0)$ and $\nu_i(W_i(b^i_{0,h}),1) = \nu_i(W_i(b^i_{0,h-1}),0)$. Therefore,

$$\nu_i(\omega_1,\rho_1) - \nu_i(\omega_2,\rho_2) = \nu_i(\omega_1,\rho_1) - \nu_i(\omega_1,0) + \sum_{b^i:\,\omega_1 < W_i(b^i) < \omega_2} \big[\nu_i(W_i(b^i),1) - \nu_i(W_i(b^i),0)\big] + \nu_i(\omega_2,1) - \nu_i(\omega_2,\rho_2)$$
$$\le r_i\Big\{\alpha_i(\omega_1,\rho_1) - \alpha_i(\omega_1,0) + \sum_{b^i:\,\omega_1 < W_i(b^i) < \omega_2} \big[\alpha_i(W_i(b^i),1) - \alpha_i(W_i(b^i),0)\big] + \alpha_i(\omega_2,1) - \alpha_i(\omega_2,\rho_2)\Big\}$$
$$= r_i\,[\alpha_i(\omega_1,\rho_1) - \alpha_i(\omega_2,\rho_2)],$$

which proves part (b).
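Both the telescoping identity and the bound of Lemma 3(b) can be checked numerically on a small example. The sketch below is a hedged illustration, not part of the proof. It assumes the two-state belief update $b_{0,h+1} = p_{01} + b_{0,h}(p - p_{01})$ with stationary belief $b_s = p_{01}/(1-p+p_{01})$ (`p01` is a hypothetical name for the OFF-to-ON probability), indexes thresholds by $h$ (the threshold $W_i(b_{0,h})$), and takes $\alpha_i$ from the relation $\nu_i = r_i[\alpha_i - (1-p_i)/D]$, where $D$ is the denominator of $\nu_i$. The parameter values are made up, with $\tau$ chosen large enough for the monotonicity in part (a) to hold for this channel.

```python
def beliefs(p, p01, tau):
    """Return (b, b_s): b[h] for h = 1..tau under the assumed update, and the stationary belief."""
    b = [None, p01]
    for _ in range(tau - 1):
        b.append(p01 + b[-1] * (p - p01))
    return b, p01 / (1.0 - p + p01)


def nu_alpha(p, p01, tau, h, rho, r=1.0):
    """(nu_i, alpha_i) for threshold W_i(b_{0,h}), 1 <= h <= tau; beyond tau the belief is b_s."""
    b, b_s = beliefs(p, p01, tau)
    nxt = b[h + 1] if h < tau else b_s
    big_b = rho * b[h] + (1.0 - rho) * nxt
    d = big_b + (1.0 - p) * (h + 1.0 - rho)
    nu = r * big_b / d
    alpha = (big_b + 1.0 - p) / d        # equivalent to nu / r + (1 - p) / d
    return nu, alpha


p, p01, tau, r = 0.8, 0.2, 6, 2.5        # hypothetical channel, truncation, and weight
grid = [(h, k / 4.0) for h in range(1, tau + 1) for k in range(5)]
points = [nu_alpha(p, p01, tau, h, rho, r) for h, rho in grid]

# Threshold at b_{0,h} with rho = 1 coincides with threshold at b_{0,h-1} with rho = 0.
telescoping_ok = all(
    abs(nu_alpha(p, p01, tau, h, 1.0, r)[j] - nu_alpha(p, p01, tau, h - 1, 0.0, r)[j]) < 1e-12
    for h in range(2, tau + 1) for j in (0, 1))

# Lemma 3(b): |nu difference| <= r * |alpha difference| for every pair of parameter pairs.
lemma_3b_ok = all(
    abs(n1 - n2) <= r * abs(a1 - a2) + 1e-12
    for n1, a1 in points for n2, a2 in points)
print(telescoping_ok, lemma_3b_ok)
```

A grid check like this only probes one channel and a handful of parameter pairs, so it illustrates the statement rather than replacing the case analysis above.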



